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Abstract 


The  objective  of  developing  the  Spectral  Mixture  Model  Algorithm  was  to  provide  some  intelligent 
algorithm  that  could  be  utilized  for  spectral  sensing  in  wideband  receivers.  The  methodology  was 
discussed  initially  in  report  AFRL-RI-RS-TR-2008-266.  The  current  report  is  a  refinement  of  the  technique 
with  the  objective  of  presenting  the  concept  to  a  broader  audience. 

The  Spectral  Mixture  is  a  generalization  of  the  Expectation  Maximization  algorithm.  The  algorithm 
reduces  the  information  divergence  of  two  distributions  by  adjusting  its  parameters.  The  algorithm  can 
be  applied  to  histogram  data  or  sample  points  for  signal  decomposition  of  multimodal  signal  in  terms  of 
mixture  elements.  The  model  was  applied  to  spectral  analysis  with  good  success  in  the  one-dimensional 
case.  To  achieve  better  convergence,  the  algorithm  may  require  the  constraint  of  some  of  the 
parameters  by  imposing  boundary  conditions  or  preventing  changes. 

This  research  explored  some  potential  applications  of  the  algorithm.  These  include:  spectral 
characterization,  speech  compression,  deconvolution  and  image  processing.  The  results  are 
summarized  in  this  report. 


1. 


Introduction 


This  report  is  divided  in  two  parts.  The  first  part  is  a  theoretical  discussion  of  the  development  of  the 
Spectral  Mixture  algorithm  from  the  perspective  of  machine  learning  and  the  second  part  is  a  discussion 
of  the  implementation  of  the  method  for  some  specific  applications. 

The  application  of  interest  is  spectral  decomposition  and  characterization,  also  referred  as  spectral 
sensing.  Given  a  spectral  estimate,  we  would  like  to  model  an  undetermined  number  of  signals  in  terms 
of  the  center  frequency,  bandwidth  and  power  density.  The  signals  are  required  to  follow  some 
predetermined  model  in  the  frequency  domain.  The  method  resembles  a  Gaussian  Mixture  approach 
with  this  important  difference:  the  Gaussian  Mixture  Model  algorithm  requires  statistical  sample  points 
while  the  Spectral  Mixture  Model  requires  histogram  data  as  the  input. 

The  Spectral  Mixture  algorithm  resembles  some  kind  of  entropy  formulation.  By  changing  the  form  of 
the  expression,  one  can  prove  that  the  algorithm  is  an  iterative  process  that  reduces  the  information 
divergence  of  two  parametric  probability  densities  until  the  process  converges  to  some  local  minima. 
Under  certain  specific  conditions,  the  algorithm  reduces  to  Expectation  Maximization,  K-Means, 
weighted  K-Means  and  Parzen  Window  depending  on  the  variable  is  chosen  for  optimization. 

One  of  the  issues  of  the  algorithm  is  caused  by  the  presence  of  an  offset  in  the  distribution.  To 
overcome  this  issue,  one  can  assign  fixed  mixtures  to  model  a  signal  offset  by  keeping  a  fixed  width  or 
variance.  This  issue  is  cover  in  section  3. 

The  algorithm  was  applied  to  signal  detection  and  speech  compression.  The  speech  compression 
method  has  very  low  complexity  and  does  not  require  the  addition  of  voiced/unvoiced  detection  or 
estimation  of  the  linear  prediction  coefficients.  The  price  to  pay  is  a  reduction  of  the  quality  of  the 
speech.  The  compressed  speech  sound  hoarse,  but  this  is  typical  of  vocoder  at  a  low  bitrates. 

The  report  pretends  to  be  an  introductory  tutorial  for  those  who  are  interested  in  this  method.  This 
review  would  serve  as  baseline  for  future  research  in  these  methods. 
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2.  Classification  Algorithms  and  Metrics 


Many  classification  and  machine  learning  algorithms  require  a  data  set  of  observations,  a  parametric 
model  and  optimization  rules.  Through  an  optimization  process,  we  find  the  parameters  that  provide  a 
best  fit  for  the  model  according  to  the  selected  rules.  The  result  of  this  process  is  called  analysis.  The 
analysis  is  usually  the  most  computationally  involved  process  of  classification.  Optimization  methods 
require  processing  many  data  points,  calculating  complex  derivatives,  iterating  many  times  or  evaluating 
complex  functions.  For  such  reason,  a  metric  of  performance  for  the  analysis  process  can  be  described  in 
terms  of  the  number  of  floating  point  operations  that  are  needed  to  find  a  solution. 

Synthesis  is  the  inverse  of  the  analysis  process.  The  synthesis  simply  requires  evaluating  the  model 
using  certain  specified  parameters.  Measuring  the  fitness  of  a  model  to  a  data,  i.e.,  calculating  the  error 
between  the  model  and  the  data  can  be  use  as  a  metric  of  performance.  Minimizing  the  error  is  a  typical 
criterion  for  developing  optimization  rules. 

It  is  often  said  that  the  results  of  the  analysis  are  as  good  as  the  model  used  to  fit  the  data.  If  the  model 
is  not  representative  of  the  data,  then  the  results  are  questionable  or  perhaps  wrong.  Various 
information  criteria  have  been  proposed  such  as  the  Akaike  and  Bayesian  Information  Criterion. 
Developing  a  metric  that  measures  the  fitness  of  different  models  is  beyond  the  scope  of  our  discussion. 


2.1. Mixture  Models 

Suppose  that  a  given  data  set  can  be  modeled  as  a  linear  combination  of  K  density  functions.  A 
parameter  vector  6  =  [a,n,a2, ...]  corresponds  to  the  mixture  probability  a,  the  mean  p ,  the  variance 
o 2  and  possibly  other  high  order  moments  that  control  the  shape  of  the  model.  For  simplicity,  we 
consider  a  vector  with  three  parameters©  =  [a,  p,  a2]  .  The  parametric  model  can  be  express  as  a 
summation  of  mixture  terms  that  follows  a  given  parametric  model  as 

P<y|9>-I«w(» 2.i 

k= 1 


The  maximum  likelihood  approach  provides  a  criterion  for  finding  the  optimal  parameter  vector.  The 
optimization  process  finds  the  parameters  that  maximize  the  likelihood  function  of  the  model  shown  in 
equation  2.2. 
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log{p(y\a,  p,  a))  =  ^  log  (  ^  afcp(y;- 1  gk,  ak) 
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Finding  maximum  likelihood  solutions  using  equation  2.1  has  several  problems.  The  derivatives  of  the 
expression  with  respect  to  its  parameters  produce  complex  terms  with  no  closed  form  solutions.  A 
second  problem  occurs  when  the  variance  approaches  zero;  then  the  normal  distribution  diverges.  This 
is  not  a  big  problem  and  can  be  solved  by  adding  a  small  non-zero  element  to  the  variance  estimate.  In 
addition,  this  case  is  likely  to  happen  when  outliers  are  present.  A  third  problem  is  called  identifiability. 
This  means  that  K  mixtures  can  be  rearranged  in  K\  combinations,  so  there  are  K\  possible  solutions. 
Once  again,  the  problem  is  not  critical.  We  cannot  rely  on  the  index  number  that  identifies  a  mixture 
because  it  can  change  its  value. 

An  equivalent  and  more  convenient  way  to  solve  the  maximum  likelihood  of  a  mixture  is  by  using  the 
Expectation-Maximization  method  shown  in  equation  2.3: 

argmax  log  (p  (y  |  a,  g,  a))  2.3 

e 


where  the  equation  is  subject  to 

K 

'Yjak  =  1  2.4 

fc= l 

The  differentiation  of  log(p(Y\a,  g,  cr))  produces  a  common  term  in  all  the  expressions  for  a,  g  and  a2. 
The  term  is  a  posterior  distribution: 


p{.ak’  gk’ak\y) 


akv{yW’Pk’°k) 

T.i=iOCip{y\ai,gi,a2) 


2.5 


The  evaluation  of  equation  2.5  is  referred  as  the  expectation  step  or  E-Step. 

The  solution  of  equation  2.3  is  obtained  through  a  series  of  iterations  that  calculate  the  posterior 
distribution  and  maximize  the  resulting  expression  in  terms  of  the  parameter  vector.  The  M-Step  or 
maximization  process  consists  of  the  reestimation  of  the  parameter  vector.  In  the  case  of  the  well 
known  Gaussian  Mixture  Model  (GMM)  algorithm,  the  model  is  given  by  a  Gaussian  density  and  the 
optimization  requires  the  evaluation  of  equations  2.6-a  through  2.6-d. 
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2.6-d 


In  order  to  go  from  a  maximum  likelihood  approach  to  the  EM  algorithm,  we  note  that  the  maximum 
likelihood  algorithm  provides  knowledge  of  the  mixture  through  the  prior  distribution  p(afe,  pfe,  °k2|  y)- 
Instead  of  ignoring  this  term,  we  decide  to  include  this  knowledge  in  our  advantage  by  reformulating 
equation  2.2  such  that  the  summation  term  does  not  appear  inside  the  logarithm.  This  is  desirable 
because  it  facilitates  solving  the  equations  as  we  will  see  soon. 

The  algorithm  can  be  reformulated  by  creating  a  dummy  variable  z.  The  variable  z  acts  like  a  tag  that 
provides  the  index  number  of  a  given  mixture.  The  index  information  is  not  available  directly,  so  we  call 
this  the  missing  random  variable. 


K 

p(y.z  =  k;§)  =  ^  «iP(y;-  | Zj  =  i; /zt>  of)  2.7 

k= l 

Some  knowledge  of  the  label  z  is  available  through  p(z  =  i|y,  of)  =  p(a;,p;,  of  |y).  This 
knowledge  is  supplied  into  the  equation  2.2  by  estimating  the  conditional  expectation  of  the  log- 
likelihood.  The  new  expression  is  called  the  auxiliary  function  L.  The  optimization  rule  consists  of 
finding  the  parameter  vector  that  maximizes  the  auxiliary  function, 

J  K 

£(0.  Sold)  =  ^  p(yj\Zj  =  k\  90ld)  ^  log  ( p{yj,Zj  =  k;  0))  2.8 

7=1  k= 1 

conditioned  to  Xk=i  P(zy  =  k;  §)  =  1. 

The  E-Step  consists  of  the  evaluation  of  p(z  =  r  |y,  cck,gk,  of).  The  M-Step  is  the  maximization  of  the 
auxiliary  function: 


^L{e,eold)  =  o  2.9 
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2.2.  Spectral  Mixtures  1-D 


The  Mixture  Model  approach  operates  in  a  set  of  statistical  samples  and  produces  the  parameters  of  a 
distribution  that  fit  a  given  model  in  an  optimal  sense.  A  similar  approach  was  developed  in  project 
AFRL-RI-RS-TR-2008-266  with  the  purpose  of  analyzing  histograms  instead  of  statistical  samples.  The 
objective  was  to  create  a  method  for  spectral  sensing,  i.e.,  a  method  for  characterizing  the  frequency 
spectrum  of  telecommunication  signals. 

The  spectral  model  can  be  derived  from  the  quantization  of  the  feature  space.  Given  a  data  set,  its 
quantized  samples  generate  a  histogram.  The  histogram  is  simply  a  pair  of  cells  and  sample  counts.  The 
cell  indexes  will  substitute  the  data  set  in  the  EM  method.  The  sample  count  will  be  reflected  as 
exponents  in  the  likelihood  function. 

First,  we  define  continuous  variable  {y  |y  e  M),  a  discrete  variable  an  arbitrary  {f\fe  Zjand  a 
quantization  function  Q(y)  that  maps  y  into  a  discrete  variable  /: 

Q:{f  =  Q(y)\yeR,feZ}  2.10 

Let's  assume  that  finite  set  of  statistical  samples  {yM=1  is  sorted  in  ascending  order  such  thaty;-  < 

yk  /or  j  <  k.  The  quantized  process  forms  a  new  set  (y/)}^_1-  We  are  interested  in  counting  the 

number  of  repeated  values  in  the  new  set.  We  define  S  as  the  count  of  samples  for  a  quantization 
index  /  as, 


S{f)  =  Count{f :  /  =  {Q(yj)}JJ=1}  211 

The  goal  is  to  investigate  the  form  of  the  auxiliary  function  (equation  2.8)  under  quantization  of  the 
sample  space.  Quantization  always  produces  lost  of  information.  In  this  case,  specific  information  of  the 
missing  variable  z  is  expected  to  be  lost  for  individual  samples.  Under  quantization,  all  samples  falling 
under  the  same  cell  i  must  share  the  same  index  number.  We  designate  a  new  variable  z  to  represent  a 
tag  for  group  of  samples  within  the  same  histogram  bin.  We  also  impose  the  probability  function  to  be 
identical  for  all  samples  inside  a  cell. 


P(z  =  0  =  P(zj  =  0  2.12 

The  likelihood  p(/|z  =  /c;  0)  will  be  defined  as  an  average  of  the  probability  models.  This  is  done  by 
adding  all  likelihoods  with  a  common  quantization  index  /  and  dividing  them  by  the  number  of  samples 
in  the  histogram  bin. 

J 

p(f\z  =  k-J)  =  p(/  =  Q{yj}\zj  =  k>  0)  2-13 
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The  posterior  probability  p(y;|z;-  =  /e;  0oid )  remains  invariant  under  the  transformation  of  equation 
2.13. 


p(z  =  k\f;90ld) 


UkP{f\z  =  k;  Gold) 

T.k=iakP{f\z  =  k;6old ) 


2.14 


Finally,  let's  rewrite  the  auxiliary  function  under  quantization.  Each  repeated  samples  modify  the 
likelihood  expression  by  rising  each  term  to  the  S(fm )  power: 

M  K 

^{0 ,  0o(d)  —  ^  ^  —  k  |  fm)  0 old)  ^  ’  log  {p^fm<  % m  —  k',  0)  ^  2  15 

m= 1  fe=l 

subject  to  the  constraint  YJk=\P{^j  =  k)  0)  =  1. 

It  was  found  convenient  to  rename  the  parameter  vector  using  variables  associated  with  the  frequency 
spectrum,  which  in  a  sense;  it  is  nothing  less  than  a  histogram.  The  terms  mean,  variance  and  mixture 
probability  will  be  replaced  by  center  frequency  fc,  statistical  bandwidth  b,  and  power  composition  p 
respectively.  The  parameter  vector  is  rewritten  as: 

9  =  [p,fcM  2.16 


2.2.1.  Maximization  of  the  Expectation 


This  section  covers  the  discussion  of  the  maximization  process  for  the  one  and  two-dimensional  case. 
The  one-dimensional  case  initial  motivation  was  to  analyze  the  frequency  spectrum  to  characterize 
signals  above  the  noise  floor.  The  motivation  for  the  two-dimensional  case  analysis  was  to  analyze 
signals  in  the  time-frequency  domain.  The  analysis  resulted  in  a  rudimentary  and  perhaps  inefficient 
image  detector.  The  results  of  the  findings  will  be  summarized  in  the  present  section. 

Let's  now  define  a  family  of  Gaussian-like  density  functions  as, 


p(/|z;  0) 


2.17 


For  N  =  2,  the  distribution  becomes  Gaussian  with  mean  fc  and  variance  b2/8.  The  function  T(V)  is  the 
Gamma  function.  The  model  has  been  plotted  for  several  values  of  N  in  Figure  1. 
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Figure  1  Parametric  model  of  equation  2.17  as  a  function  of  N 


Maximizing  the  expectation  results  in  equations  2.18-a  through  2.18-d. 


M 


;£(0, 90ld)  =  Y,  p{zm  =  k\fm ; 9oM)s(fm )  —  ^  (/m^  Y")  =  0 


dfc.i 


m=  1 


(/m-/c)V  &k/2 


2.18-a 


M 


m=  1 


( 

N 

f/m  -  /c,kV 

V  bk 

ibk/ 2) 

l  afe/2  J 

JV 


a 


£(0.  a0;d)  =  0  ->  P(zm  =  k)  = 


p(_zm  ~  k\fm',  9 old)  S(fm) 
Ef=iSn=iP(zn  =  i\fn)  90ld)  S(fn) 


2.18-c 


The  optimal  parameter  vector  9opt  =  [p0pt>fcopt>b0pt]  's  obtained  by  implementing  the  following 
steps: 

Algorithm: 

1.  E-Step:  Evaluate  p(zm  =  k\fm;  90ld ) 

2.  M-Step:  Evaluate  9  =  [p,fc,  b]  using  equations  2.18-a  to  2.18-c 

3.  Verify  convergence  by  comparing  9  and  9otd 
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2.2.2.  Spectral  Mixtures  2-D 


An  interesting  development  of  the  spectral  mixtures  is  when  extending  the  concept  for  the  two- 
dimensional  case.  We  are  going  to  show  how  quantities  like  as  center  of  mass,  standard  deviation  and 
regression  line  arises  from  the  maximization  of  our  Gaussian-like  model. 

The  parametric  model  used  is  a  Clipped  Gaussian  distribution  in  two  dimensions.  The  distribution  is 
controlled  by  a  parameter  N  that  modifies  the  decaying  slope  of  the  density.  We  also  introduce  a  set  of 
affine  transformations  modified  by  the  following  parameters:  translation,  scale,  shear  and  rotation.  In 
our  case,  shear  is  constrained  by  scale  because  the  intention  of  this  particular  model  is  to  characterize 
modulated  signals  in  the  joint  time-frequency  domain. 

Our  2-D  model  defines  an  affine  transformation  Q,  a  translation  vector  jl  and  a  coordinate  vector  x.  The 
affine  transform  Q  includes  a  shear  matrix  [ Sh ],  a  scale  matrix  [Sc]  and  a  rotation  matrix  [R], 


Q(x  —  /I)  =  [Sft]-1[Sc]-1[i?]-1(x  —  q)  2.19 


The  parameters  of  the  matrices  are  the  angle  of  rotation  0,  the  scale  factor  sx  in  the  x  coordinate  and 
the  scale  factor  sy  in  the  y  coordinate.  The  parameter /I  is  a  translation  vector.  The  specific  form  of  the 
matrices  is  shown  in  equations  2.20  through  2.22. 


f?(0)  = 


COS  (0) 
sin  (0) 


— sin(0) 
cos  (0) 


2.20 


Sc  = 


SX 

0 


01 


2.21 


The  model  provides  sufficient  parameters  for  characterizing  telecommunication  signals  in  the  time- 
frequency  domain.  The  shear  models  a  parallelogram  shape  which  is  characteristic  of  chirp  signals  in  the 
joint  time-frequency  domain. 


Sh  = 


— -tan  (0) 

Sv 


o- 

1 


2.22 


A  new  vector  is  constructed  by  raising  each  coordinate  to  the  N  power.  Equation  2.23  is  a  non-linear 
transformation. 


q  =  {Q(x-H)}N  2.23 
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The  resulting  exponential  function  resembles  a  rectangular  surface  for  N  =  4  as  shown  in  Figure  2. 


p(x\z;  6) 


1 

V0) 


exp{—qTq} 


2.24 


Figure  2  Shape  of  the  parametric  model  in  2  dimensions 
of  equation  2.20 

The  parameter  vector  is  defined  in  terms  of  these  geometrical  parameters  as, 

0  [p,  Px>  Py*  ^X’  Syi  0]  2.25 


where  p  is  the  ratio  of  the  volume  of  one  mixture  over  the  entire  volume. 

Normalization  requires  integrating  the  whole  surface.  For  N  =  4,  the  volume  was  approximated  using 
equation  2.26.  The  volume  of  the  distribution  is  a  function  of  to  the  scales  and  the  order  of  the 
distribution  N.  For  our  purposes,  N  is  constant  so  the  volume  can  be  expressed  as  a  function  of  the 
scaling  parameters. 


1/(0) 


=  JJ  exp{- 


■ qTq )  dx  dy  »  c  •  sx  ■  sy 


2.26 


The  auxiliary  function  L  keeps  the  same  appearance.  The  variable  fm  is  replaced  with  vector  x: 

M  K 

Odd)  =  ^  p(Xn  =  k|xm;  eold )  ^  log  (p{xm,zm  =  k;  2.27 

m= 0  fe= 1 


subject  to  the  constraint  Efe=iP(zm  =  k;6~)  =  1. 
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Maximizing  the  expectation  requires  finding  the  roots  of  the  derivatives  of  the  auxiliary  function.  The 
derivatives  can  be  found  by  using  Mathematica  or  other  symbolic  math  application. 


For  the  translation  [ix,  the  derivatives  are  given  by  equation  2.28. 
-L(9,  90ld)  =  ^ 

x  y 


dfi 


~£-[9,9 oid )  ^  ^  '  p{_zm  k\Xm>ym>  Sold)  '  S(.Xm>ym) 


-4 


N  I 


(x  -  /z*)sec  (0)\  f{x-  nx)  cos (0)  +  (y  -  p y)cos  (0) 


+ 


2iV\ 


2.28 


—  ln(k  •  sx  •  sy 


)  =° 


Some  simplification  can  be  accomplished  by  using  N  =  1  as  an  approximation. 

—  k\xm, ym',  9 0id )  •  Sixm, ym)  •  X 

fix  = - — - 5 — -  2.29 

YixYjyPizm  ~  k\xm,ym>  @  old)  '  ^ixm>ym) 


This  is  an  expression  equivalent  to  the  center  of  mass  in  the  x  coordinate.  A  similar  expression  was 
found  for  the  y  coordinate. 


Xx2yP(zm  ^1  Xm>ym>@old)  '  SiX-mi  Vm)  '  y  2  30 

2%2yP(^m  —  k\Xm>ym>  @old)  '  S ixrmym ) 


For  the  scaling  sx,  the  maximization  of  the  auxiliary  function  produces  a  closed  form  solution. 


Sy  - 


fY.xY.yp(.zm  =  fc[xm,ym;  9oid )  •  s(xm,ym )  •  {21+2WjV(sec(0))2W(x  -  fix)2N] 

\  YixYjyP(zm  ~  k\xm,  ym>  @  old)  '  S(xm>ym) 


.  1/(2  N) 


2.31 


In  a  similar  manner,  the  scaling  sy  is  given  by, 


^Z*Zyp(zm  =  k|xm,ym;  90ld)  •  S(xm,ym)  •  {2 1+2NN  ((x  -  nx)  Sin(0)  -  (y  -  py)  cos(0))} 

YixYiyP{_zm  —  k\Xm>ym>  @old)  '  Sixm,ym) 


1/(2  N) 


2.32 
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These  expressions  reduce  to  the  standard  deviation  when  the  order  of  the  distribution  is  one.  In  the  y 
coordinate,  the  standard  deviation  is  calculated  along  the  regression  line  of  the  data  given  by(x  — 
Px)  sin(0)  —  (y  —  py)  cos(0)  =  0.  One  can  argue  that  the  standard  deviation  provides  a  rough 
approximation  of  the  scales  sx  and  sy  during  our  optimization  process.  This  approximation  was  verified 
during  simulations. 

For  the  angle  of  rotation  0,  the  maximization  of  the  auxiliary  function  results  in  equation  2.33. 

IL  p(zm  k\ Xm,ymj  @old)  '  S(xm,ym) 

x  y 

■  {-2 N  ((w,|  •  (p  -  Po))2W  '(Wi  •  (p  -  Po))/-5yW)  +  ((a:  -  Px) sec  (0))2Wtan  (0)} 

2.33 

The  vectorslv||,  w±,  p  and  p0  are  vectors  the  principal  components  of  the  data  as  illustrated  in  Figure  3. 
The  data  is  distributed  inside  a  parallelogram  that  is  at  a  distance  p  —  p0  from  the  origin.  The  slope  of 
the  principal  component  equals  tan  (0)  and  runs  parallel  to  the  regression  line  W| | .  The  dot  product  of 
the  distance  vector  and  the  principal  component  wL  defines  the  regression  line. 

W||  ■  (p  -  p0)  =  -  sin (0)  (x  -  Px)  +  cos (0)(y  -  py)  =  0  2.34 


Figure  3  Distribution  of  data  points  of  the  selected  model: 
Two  principal  component  vvx  and  Wy  from  distribution 
of  equation  2-20. 
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The  regression  line  approximation  is  verified  when  N  =  1.  The  slope  of  the  regression  line  becomes: 


tan  (0)  = 


Xx2yP(zm  —  k\xm,  ym>  @old)  '  S(xm,ym)  '  (.x  Px)  '  (y  Py) 
2xSyP(^m  —  k\  Xm>ym’  @ old)  '  ^^xm>ym)  '  (y  ~  Py) 


2.35 


The  optimal  parameter  vector  8opt  is  obtained  by  implementing  the  following  steps. 
Algorithm  2-D: 

1.  E-Step:  Evaluate  p(zm  =  k\xm,ym;  0oid) 

2.  M-Step:  Evaluate  6  =  [p,fix,[iy,sx,Sy,<p]  using  equations  2.25  to  2.31. 

3.  Verify  convergence  by  comparing  6  and  0oid 


2.2.3.  Relationship  between  Spectral  Mixtures  and  Parzen  Windows 


Parzen  Windows  is  a  non-parametric  method  used  for  the  estimation  of  probability  density  function.  In 
machine  learning,  a  non-parametric  method  implies  that  there  is  no  control  over  the  model.  The  most 
common  way  for  adjusting  the  parameters  of  a  Parzen  Window  is  by  trial  and  error.  The  best  estimate 
comes  from  reducing  the  error  between  the  data  histogram  and  the  model.  This  can  be  done  by  visual 
inspection.  Other  approaches  use  neural  networks  to  accomplish  this  task. 


The  method  models  discrete  histogram  bins  by  means  of  kernel  functions.  In  the  simplest  case,  it  defines 
a  rectangular  window  of  unit  area  and  width  h.  The  density  p(x;)  is  estimated  using  the  window 
function  w(x)  shown  in  Figure  4  as  blue  rectangles,  the  number  of  data  points  xt  that  falls  under  the 
window  and  the  area  (or  volume)  of  the  window. 


PO; ) 


samples  inside  the  bin  i 
total  samples  ■  area 


2.36 


PO; ) 


K  ■  h 


2.37 


w(x) 


i  1 

1  for  -  -  <  x  <  - 

0  otherwise 


2.38 
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Figure  4  Concept  of  Non-Parametric  Parzen  Windows 

The  window  w(x)  is  known  as  the  kernel.  It  does  not  need  to  be  a  rectangular  window.  Parzen  proved 
that  it  can  take  the  form  of  equation  2.39  or  other  shapes  as  long  as  the  following  conditions  are  met: 


dx  >  1, 


2.39 


|  w(x)  |  <  oo,  2.40 


lim  |  x  •  w(x)  |  =  0.  2.41 

X~>oo 


It  is  possible  for  this  type  of  kernel  to  converge  to  the  original  density  function  when  h  -»  0  for  a  large 
number  of  windows  is  a  normalized  density  function: 


1  2.42 


The  Spectral  Mixture  algorithm  is  also  based  in  quantization  of  the  sample  space.  This  similarity  between 
Parzen  Windows  and  Spectral  Mixtures  make  one  wonder  if  there  is  a  relationship  between  both 
algorithms.  In  the  spectral  mixture  case,  the  approximate  number  of  samples  in  bin  xm  is  given  by: 

M 

s(^  =  IH^r)  243 

i= 1 

These  samples  modify  the  likelihood  in  the  form  of  exponents  as  shown  previously. 
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In  the  Parzen  Window  method,  the  bandwidth  bt  and  the  center  frequency  xt  are  at  fixed.  The 
minimization  process  is  done  with  respect  to  the  parameter  pkew  =  P(zm  =  k).  We  also  impose  that 
the  number  of  samples  and  mixtures  are  the  same  (M  =  K).  These  assumptions  are  incorporated  in 
equations  2.44  and  2.46  as: 


M 


£(Pnew.Pold )  =  ^  p0m  =  k\xm;  poid)  ^  log(j)(xm,  zm  =  k;  pnew)S{Xm)) 


m= 1 
K  r,new 


■ 


subject  to  the  constraint  pkew  =  1  and  posterior: 


p(zm  =  k\xm-,plld)  = 


p{xm\zm  =  k',Pkd)p, 


old\  ^ old 
k 


The  optimal  solution  for  p  is  given  by  equation  2.18-c: 

Im=iP(zm  =  k\xm-,  p£ld)  S(xm) 


Prw  = 


Sf=lSn=lP(zn  =  ^1  Xn;PkM)  S(xn) 


2.46 


2.44 


Substituting  equations  2.45  in  2.46  result  in  2.47. 


M 


nnew  _ 

rk  y  m 

Ltn= 


j- _ y  pi 

1S(xn)  2^Y.i= i 


P(.xm\zm  =  k-,plld)pold 


P(xm\zm  =  l>Pkd)Pl 


.  nold\r,old  y  m 


S(xm)  2.47 


The  likelihood  has  the  same  form  as  the  normalized  window  function  in  2.37  as: 

(xm  ~  xk\  „old 


„new 

Pk 


S(xm)w  (- 


m  crY 

1  ^—i  ■-’v'-m. 

.•.S(xn)  4_i  y K 

1  v  nJ  m=  1  Zul= 1 


h  )  Pk 

iLm  / 


"tt)'’; 


xm  xl\  nold 
M 


2.48 


The  convergence  of  the  parameter  pkew  -»  p *k  suggests  that  1/M  is  a  possible  solution.  This  solution  is 
in  agreement  with  the  Parzen  Window  method. 


Pk 


1 


4,n= 1 


S(xn) 


2.49 


The  sum  over  all  pk  must  be  equal  to  one.  This  was  our  original  constraint  in  equation  2.39,  which  is  also 
true  for  the  equation  2.50.  This  shows  that  the  spectral  mixture  algorithm  reduces  to  Parzen  Window 
when  the  parameters  are  fixed  and  the  parameter  pk  converges  to  the  specified  value. 
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1 


2.50 


yM 
Z-m=  1 


S(xn) 


The  Parzen  Window  may  require  finding  optimal  estimates  of  the  bandwidth  hM.  The  Spectral  Mixture 
approach  already  provides  a  way  to  readjust  the  bandwidth  and  distances  between  mixtures. 


2.2.4.  K-Means  and  Spectral  Mixtures 


The  K-Means  algorithm  is  a  special  case  of  the  Expectation  Maximization  method  for  a  mixture  of 
Gaussian  densities,  also  known  as  Gaussian  Mixture  Models  (GMM).  The  difference  between  the  K- 
Means  and  the  GMM  is  that  K-means  makes  a  deterministic  assignment  of  samples  to  a  cluster  (or  hard 
decision)  while  the  GMM  makes  a  probabilistic  assignment  or  soft  decision  using  a  Gaussian  distribution 
instead  of  a  rectangular  window  function. 

Going  from  a  soft  decision  to  a  hard  decision  is  equivalent  to  fixing  the  variance  and  allowing  finding  the 
limit  as  it  goes  to  zero.  The  maximization  of  the  auxiliary  function  2a2  -£(p,  poid)  is  done  by  with 
respect  to  the  centroids  p k . 

M  K  2 

g2  ■  £(p,  gold)  =  ^  p(z  =  k\y,  gold)  ^  ^  ^  +  G2  •  log  ~  °2  •  login)  2.51 

m= 1  k= 1 


lim 

<72->0 


{ 


2.52 


As  the  variance  goes  to  zero,  the  conditional  probability  p(z  =  k\f;  60id)  converges  to  the  unity  only  at 
the  data  point  that  is  closest  to  the  centroid  gk. 


lim  p(z  =  k\y;  gold)  =  y(z  =  k\y)  = 
a^O  L  1 


for  y-gk*  min  (y  -  pfe) 
for  y  -  Pfc  =  min  (y  -  pfe) 


2.53 


The  auxiliary  function  becomes: 

/v  .. 

2a2  -£(p,poid)  =  II  -y(z  =  /c|y)(y  —  pfe)2  +  constants  terms  2.54 


M  K 


m=  1 fe= 1 
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The  extending  K-Means  to  Spectral  Mixtures  is  very  simple.  The  distance  between  the  samples  /  and  the 
bins  fc  k  are  weighted  by  the  histogram  count  S(f).  The  resulting  algorithm  finds  K  centers  of  mass  of  a 
histogram. 


M  K 

2er2  ■  £(fofc,oid )  =  Y  X  2Y ^  =  ~  /c,fc)2  +  constants  terms  2.55 

m= 1 k= 1 


Table  1  Spectral  Mixture  Cases 


Spectral  Mixture  Case: 

Algorithm: 

Case  (/)  =  1  : 

Expectation  Maximization 

Case  S(/)  =  1,  o  -*  0 

K-Means 

Case  S(f)  =  const,  f  =  const,  a  = 
const. 

Parzen  Window 

2.3.Spectral  Mixtures  and  Kullback-Leibler  Divergence 

As  we  have  seen,  the  Spectral  Mixture  approach  can  be  considered  as  a  general  case  of  EM,  K-Means 
and  Parzen  Window.  In  addition,  we  can  observe  that  the  auxiliary  function  looks  somewhat  similar  to 
some  sort  of  entropy. 

The  Spectral  Mixture  algorithm  will  be  modified  without  altering  the  final  result.  Instead  of  using  an 
arbitrary  histogram,  we  decide  to  normalize  S(xm )  and  expressed  it  as  a  probability  density 
function  p(fm)  =  S(xm)/ f  S(x)dV.  The  normalization  constant  f  S(x)dV  and  the  constant 

expression  I"=0p(zm  =  k\xmi  90id)Pixm)  ELi lo9  (p(zm  =  k\xmi  B0id)P(xm))  do  not  affect  the 
maximization  process  with  respect  to  6. 

M  K 

Bold)  =  Y  P^m  =  k\*m'  ® old)P(Xm )  Y  l°9  {p(.*rn\Zm  =  k;  6)p(zm  =  fc)) 

m= 0  fe= 1 

m  k  2  56 

+  Y  P^m  =  k\*m’  Y  l°9  (Kfm  =  k\Xm)  0O!d)p(*m)) 

m= 0  k= 1 


The  resulting  expression  is  the  negative  of  the  Kullback-Leibler  divergence  between  distributions 
q(x,  z;  9old)  =  £  p{x\z;  90ld)p(x)  and  p(x,  z;  §)  =  Y.p(x\z;  d)p(z). 


Dkl  ( q(x,z ;  9old)  \  \  p{x,z\  0))  =  -L(§,90ld)  2.57 
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The  Spectral  Mixture  algorithm  minimizes  the  divergence  of  the  distributions  corresponding  to  a 
selected  model  and  the  data  distribution.  Upon  successive  iterations,  the  probability  of  the  data 
predicted  by  the  model  approximates  the  true  distribution. 

8new  =  arg  mjn  DKL  (q(x,  z ;  dold)  \  \  p(x,  z;  0))  2.58 

The  minimization  process  requires  the  correct  model  to  achieve  the  following  convergence: 

K 

P(*m)  =  p(xm\zm  =  k;  d)p(zm  =  k)  ->  p(xm)  2.59 

fe= l 

Other  divergence  formulations  can  be  used  instead  of  equation  2.58.  The  minimization  of  the 
parametric  Renyi  alpha  divergence  [9]  (equation  2.60)  will  be  based  on  the  log  of  the  ratio  between  the 
model  and  the  data.  This  is  a  maximum  likelihood  formulation  analogous  to  the  equation  2.2  that 
originated  all  this  theory. 

£>oo  (q(x,  z;  8old )  |  |  p(x,  z;  fl))  =  log  (  P{X’ Z’^  )  >0  2.60 

\q[x,z-,Gold)J 
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3.  Implementation  of  the  Spectral  Mixture  Algorithm 


3.1.Spectral  Sensing 

The  Spectral  Mixture  model  algorithm  was  implemented  in  Matlab  using  1-D  Gaussian  densities  and 
provided  as  an  example.  Some  rules  were  implemented  to  control  the  convergence  of  the  mixtures  in 
the  M-Step.  For  instance,  if  the  variance  is  less  than  a  minimum  value,  then  reset  the  variance  to  a 
predetermined  value. 

The  first  case  is  an  example  of  a  pure  Gaussian  Mixture  process  with  N  =  2  in  equation  2.17.  Four 
Gaussian  mixtures  were  generated  with  noise.  (See  Figure  5)  A  total  of  ten  mixture  elements  were 
added  and  the  algorithm  iterated  20  times. 


Analysis 


Figure  5  Analysis  of  a  histogram  of  Gaussian  mixtures 


Convergence  to  the  desired  values  was  achieved  in  the  first  5  iterations  approximately  as  shown  in 
Figure  6. 
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Mixtures  and  Original 


Figure  6  Convergence  of  the  Gaussian  mixtures 


The  overall  contribution  of  all  the  mixtures  has  an  average  squared  error  of  0.009.  The  synthesis  shown 
in  Figure  7  resembles  the  original  distribution.  One  of  the  explored  ideas  was  the  design  of  a  simple 
vocoder  that  encodes  the  values  of  the  mean  and  variances  and  then  synthesizes  the  speech.  This  will 
be  discussed  in  section  3.2. 


Synthesis 


Figure  7  Synthesis  of  the  Spectral  mixtures 

The  spectral  mixtures  work  for  clipped  Gaussian  distributions  generated  from  equation  2.17.  In  this 
case,  the  feasibility  of  using  the  algorithm  for  spectral  sensing  applications  was  explored.  For  these 
experiments,  a  polyphase  filter  was  used  to  provide  a  spectral  average  of  generic  MPSK  and  FPSK 
signals.  Figure  8  shows  two  MPSK  signals  under  SNR  below  15  dB. 
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Iteration  26 
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Figure  8  Synthesis  of  the  frequency  spectrum 

The  algorithm  achieves  convergence  when  one  of  the  mixtures  is  constrained  from  moving  and 
expanding:  fixed  center  frequency  and  fixed  bandwidth.  The  trick  forces  the  mixture  to  converge  to  the 
noise  floor  of  the  spectrum  while  the  other  mixtures  converge  to  the  signals  above  the  noise  floor. 

An  attempt  to  implement  a  two-dimensional  Spectral  Mixture  revealed  that  the  process  is  slow, 
computationally  intensive  and  does  not  converge  so  easily  to  the  desired  solutions. 


Figure  9  Convergence  of  a  2-dimension  Spectral  Mixture,  45dB 


Figure  9  shows  two  BPSK  signals  switching  from  on  and  off  states  in  a  short  time  period.  The  vertical  axis 
represents  frequency  and  the  horizontal  one  represents  time.  The  SNR  of  the  signals  is  above  45  dB. 
The  figure  to  the  left  shows  the  centroids  after  convergence.  The  right-side  figure  shows  the  product  of 
the  posterior  distribution  p(zm  =  k\xm;  90[d)  with  the  spectral  distribution  S(f).  The  posterior 
distributions  act  as  a  window  that  block  the  undesired  signals  and  only  allow  the  portion  of  the 
distribution  S(f)  that  contributes  to  a  mixture. 
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Under  more  severe  SNR  and  without  the  usage  of  a  fixed  mixture  that  converges  to  the  noise  floor,  the 
method  breaks.  The  centroids  are  attracted  by  the  areas  of  high  density.  In  some  cases,  a  mixture  can 
converge  to  large  of  portions  of  the  noise  floor  as  shown  in  Figure  10.  The  addition  of  the  fixed  mixture 
shows  very  little  improvement. 


3.2.Simple  Vocoder 

Vocoders  are  a  set  of  coders  specifically  designed  for  voice.  Basically  there  are  three  types  of  speech 
coding:  waveform  coding  (such  as  ADPCM),  source  or  parametric  coders  (vocoders)  and  hybrid  coders 
which  are  a  combinations  of  the  first  two.  One  important  difference  among  these  approaches  is  the 
quality  and  achievable  bitrate.  The  hybrid  coders  have  the  best  quality  and  performance,  but  also  they 
are  the  ones  with  the  highest  complexity. 

Vocoders  rely  on  speech  synthesis  and  psychoacoustic  models  for  speech  synthesis.  The  vocoders  rely 
on  analysis  and  synthesis  that  produce  speech  that  is  perceptually  acceptable.  The  synthetic  speech 
does  not  have  to  necessary  match  the  original  signal.  The  synthetic  speech  is  usually  as  good  as  the 
model.  The  model  usually  separates  a  speech  signal  into  voiced  or  periodic  speech  and  unvoiced 
speech.  The  signals  are  also  separated  in  the  excitation  and  the  vocal  track  response.  The  problem  of 
characterizing  the  vocal  track  is  related  to  linear  prediction.  The  linear  prediction  problem  is  equivalent 
to  a  system  identification  problem  and  it  can  be  proved  that  the  optimal  linear  prediction  coefficients 
that  minimize  the  error  are  exactly  the  coefficients  of  autoregressive  model  that  generates  speech.  [2] 

The  Multiband  Excitation  Vocoder  is  an  "analysis  by  synthesis"  method  which  means  that  the  synthetic 
speech  is  compared  with  the  original  to  get  an  error  signal.  [8]  The  method  minimizes  the  error  by 
adjusting  the  spectral  envelope.  The  model  also  ignores  the  phases  of  the  spectral  response  as  shown  in 
equation  3-1.  The  MBE  defines  a  criterion  error  to  be  minimized.  The  error  is  just  the  difference 
between  the  spectral  power  of  the  original  S(cj)  and  the  synthetic  signal  Sw(a>)  which  is  the  product  of 
the  spectral  response  given  the  vocal  track  and  the  excitation  Ew(a)). 
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Error  =  ^ J  \S(a))\-\Hw(a))\\Ew(a))\\2da) 


3.1 


The  Spectral  Mixture  provides  a  way  to  analyze  and  synthesize  the  spectral  response  |S(a>)|  of  the 
speech.  By  using  a  nearly  perfect  reconstruction  filter  bank  and  Spectral  Mixtures  we  can  encode  and 
decode  speech  as  shown  in  Figure  11.  The  approach  assumes  that  frequency,  bandwidth  and  intensity 
of  spectral  mixtures  can  be  used  to  encode  speech.  The  parameters  can  be  used  to  reconstruct  the 
synthesized  version  for  approximating  the  spectrum. 


Figure  11  Diagram  of  the  Proposed  Vocoder  Implementation 

Instead  of  estimating  the  vocal  track  and  the  excitation,  the  signal  is  represented  as  a  set  of  three 
parameter  vectors:  the  center  frequency,  bandwidth  and  power  of  the  spectral  response.  Synthesized 
speech  using  Gaussian  distribution  results  in  Gaussian  pulses  in  a  time  with  a  width  that  is  inversely 
proportional  to  the  bandwidth.  As  an  alternative  to  Gaussian  pulses,  we  can  synthesize  an  approximate 
the  time  response  with  sine  functions  shown  in  equation  3.2. 

sinc(nb(f  —  /c)) 
nf 


p(/|z;0)  * 


The  filter  banks  were  implemented  in  Matlab®.  Figure  12  shows  the  decomposition  and  synthesis  of  a 
256-channel  filterbank.  The  filter  uses  a  Kaiser  window  with  30  dB  of  attenuation  and  transition 
bandwidth  of  5  percent.  These  parameters  were  chosen  to  preserve  the  reconstructed  signal  from 
distortion. 

The  advantage  of  this  approach  is  the  little  complexity  in  the  implementation.  The  scheme  does  not 
need  voice/unvoiced  detection. 


22 


Original 


-0  4 - 1 - 1 - 1 - 1 - 1 - i - 1 

0  0.5  1  1.5  2  2.5  3  3.5 

x  104 

Reconstructed 


Figure  12  Original  and  Decoded  Speech 


Figure  13  shows  the  approximation  of  the  voice  and  unvoiced  speech.  The  algorithm  provides  a  crude 
approximation  in  both  cases. 

The  algorithm  is  implemented  using  matrices.  The  computation  of  the  posterior  distribution  in  the  E- 
Step  requires  0(N  *  M  *  /)  floating  point  operations,  where  N  is  the  number  of  data  points,  M  is  the 
number  of  mixtures  and  /  is  the  number  of  iterations  used.  In  this  particular  example  the  number  of 
floating  points  operations  is  approximately  0(N  *  M  *  /)  =  256  *  10  *  40  =  102400. 

The  achieved  compression  is  as  follows:  The  speech  audio  file  is  57KB  sampled  at  8kHz.  The  speech 
separated  into  its  frequency  components  using  a  256-channel  filter  bank.  The  spectral  mixture 
approach  uses  15  Gaussian  Mixtures  and  produces  three  outputs:  center  frequency,  bandwidth  and 
power  composition.  The  data  is  saved  using  the  following  fields: 

1.  1  byte:  center  frequency,  the  data  range  is  (0-255) 

2.  1  byte:  bandwidth,  the  data  range  is  (0-255) 

3.  1  byte:  power  composition,  the  data  range  is  (0-100) 

4.  8  bytes:  total  power  per  window  of  data,  the  data  range  can  currently  take  floating  point 
numbers  of  double  precision.  This  value  is  necessary  because  the  spectral  mixture 
algorithm  only  takes  normalized  distributions.  The  value  could  be  reduced  to  4  bytes. 
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The  57  KB  file  produces  is  compressed  to  a  4.27  KB  file.  The  compression  ratio  is  12.12.  The  file  plays  for 
3  seconds.  The  original  bit  rate  is  57K  *  8  /3  =  152  kbps.  The  compressed  audio  rate  is  4.27K  *  8/3  =  11 
kbps. 

For  having  a  simple  architecture  the  proposed  vocoder  appears  to  have  a  fair  performance  with  some 
hoarse  and  buzzing  sound.  Of  course,  there  are  state  of  the  art  vocoders  that  work  much  better  at 
lower  rates,  for  example,  the  MBE  vocoder  works  at  4.15  kbps  and  the  MELP  works  at  2.4  kbps.  [10] 
Nevertheless,  these  vocoders  have  sophisticated  processing  and  it  is  not  the  intention  of  this  project  to 
beat  the  current  state  of  the  art. 


3.3. Other  Applications 

3.3.1.  Blind  Deconvolution 

One  of  the  original  goals  was  to  develop  a  deconvolution  model  that  would  provide  estimates  of  the 
periodicity  of  a  cyclostationary  process. 

The  model  used  was  a  distribution  shown  in  equation  3.3.  The  equation  describes  a  square  pulse 
modulated  by  a  cyclostationary  signal  with  an  offset.  The  cyclostationary  signal  is  created  with  a  raised 
cosine  filter. 
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3.3 


The  idea  was  to  model  a  cyclostationary  signal  as  distribution  with  parameters  /?,  L  and  T.  First,  a  raised 
cosine  function  is  created.  The  response  adds  an  offset.  The  signal  is  modulated  with  a  square  pulse. 
(Figure  13)  The  separation  between  the  impulses  was  chosen  to  be  the  same  as  the  distance  between 
the  peak  of  the  filter  response  and  the  null.  This  condition  is  required  for  interpolating  data  points 
between  impulses. 


Figure  13  Cyclostationary  signal  and  square  pulse 
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Figure  14  Mixtures  for  deconvolution  and  signal  with  offset 


Aproximation  of  a  baseband  signal  using 
sine-modulated  N=4  Gaussian  densities 


Figure  15  Synthesis  of  a  cyclostationary  signal  with  offset 

The  cyclostationary  signal  is  transformed  into  a  mix  of  density  functions.  (Figure  15)  The  approach  has 
problems  that  were  found  to  be  impossible  to  overcome.  The  mixing  of  mixtures  creates  a  signal  that  is 
extremely  difficult  to  separate.  For  instance,  the  proximity  of  two  distributions  can  cause  the  algorithm 
to  converge  to  a  local  maximum  instead  of  the  absolute  maximum.  This  is  a  weakness  of  all  the 
algorithms  based  on  Expectation  Maximization.  Second  and  probably  the  most  critical  consideration  is 
given  by  equation  2-57.  The  algorithm  is  an  iterative  process  that  reduces  the  divergence  between  the 
red  trace  and  the  blue  trace  in  Figure  15;  however,  there  is  no  guarantee  that  the  minimum  divergence 
produces  the  optimal  parameters  for  this  distribution.  The  idea  was  abandoned  due  to  these  two 
difficulties. 
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3.3.2.  Image  Detection 


An  image  can  be  represented  as  a  mixture  of  densities.  So  one  natural  question  would  be  whether  the 
Spectral  Mixture  approach  can  use  as  an  image  detector. 

The  development  is  not  so  different  from  the  development  of  the  affine  transform  model  shown  in 
equations  2.19  through  2.20.  The  development  produces  factors  that  depend  on  the  gradients.  This  is 
interesting  because  gradients  are  used  in  image  processing  for  shape  detection.  Other  than  a 
mathematical  curiosity,  the  scheme  fails  due  to  the  presence  of  multiple  local  maxima  and  the  number 
of  degrees  of  freedom,  i.e.,  the  dimension  of  the  parameter  vector.  This  idea  was  abandoned  due  to 
these  two  difficulties. 


3.3.3.  ABC  Process 

The  Adjustable  Bandwidth  Concept  (ABC)  is  a  methodology  developed  by  [7]  that  can  be  applied  for  the 
detection  of  wideband  signals.  The  method  process  creates  averages  and  high  frequency  versions 
(wavelet  decomposition)  in  the  frequency  spectrum.  This  research  explored  the  possibility  of  combining 
both  approaches. 

The  ABC  algorithm  performs  much  better  with  another  algorithm  called  Connective  Components.  The 
latter  algorithm  works  very  efficiently  with  the  binary  data  produced  by  the  ABC  method.  We  discarded 
using  the  Spectral  Mixture  for  this  application  due  to  the  computational  efficiency  of  the  Connective 
Component. 
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4.  Conclusions 


In  report  AFRL-RI-RS-TR-2008-266,  it  was  noticed  that  raising  the  likelihood  to  an  integer  power  S(f) 
was  equivalent  to  adding  S(f)  samples  with  the  same  value.  This  is  true  in  both:  the  maximum 
likelihood  approach  and  the  Expectation  Maximization  method.  This  fact  was  use  to  develop  an 
inference  method  for  histograms.  The  method  was  referred  as  Spectral  Mixtures. 

In  this  report,  the  theory  of  Maximum  Likelihood  and  Expectation  Maximization  methods  was  reviewed. 
For  developing  the  Expectation  Maximization  method,  we  took  advantage  of  the  posterior  density  from 
equation  2.5  to  produce  an  average  over  a  modified  likelihood  given  by  equation  2.7.  The  maximization 
process  resulted  in  the  Expectation  Maximization  algorithm. 

The  development  of  the  Spectral  Mixture  Model  considered  the  quantization  of  the  sample  space  in 
histogram  bins  as  shown  in  equation  2.11.  There  is  a  variable  associated  with  this  quantization:  this  is 
the  histogram  count  S(f).  This  quantity  was  constrained  to  the  set  of  positive  integers,  but  this 
constraint  can  be  relaxed  to  any  positive  real  number.  The  quantity  S(f)  can  be  expressed  as  a 
probability  distribution,  which  makes  possible  to  reformulate  the  Spectral  Mixture  Model  approach  as  a 
minimization  of  the  divergence  of  two  distributions.  The  new  algorithm  takes  the  form  of  equation  2.58 
where  6  is  the  parameter  vector 

@new  —  arg  min  DKL  (q{x,  z;  80ld)  \  |  p(x,  z;  0))  4.1 

and  q{x,  z;  80ld )  and  p(x,  z;  0)  are  given  by: 

q(x,z;90ld)  =  ^  p(x|z;  0oW)p(x) 

p(x,z;0)  =  ^p(x|z;6»)p(z). 

p(xm)  =  S(xm)/J  S(x)dV 

The  Spectral  Mixture  algorithm  was  applied  to  the  problem  of  spectral  sensing  of  communication  signals 
and  speech  signals.  In  both  cases,  the  amplitude  of  the  spectrum  is  considered  while  the  angle  is 
neglected.  The  method  works  fine  for  cases  where  no  or  little  noise  floor  is  present.  In  order  to 
overcome  this  difficulty,  one  of  the  mixtures  is  forced  to  have  constant  variance.  The  variance  is  made 
large  enough,  so  the  mixture  converges  to  the  noise  floor  as  shown  in  Figure  8.  Clipped  Gaussian 
densities  were  used  for  simplification  purposes.  The  log-likelihood  of  such  distributions  is  reduced  to  a 
polynomial  expression  and  the  maximization  process  becomes  less  complex.  Sometimes,  we  get  closed 
form  solutions,  but  this  is  not  always  the  case. 
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The  Spectral  Mixture  was  applied  to  deconvolution  and  image  processing.  It  was  found  that  the  Spectral 
Mixture  approach  inherits  all  the  weaknesses  of  the  Expectation  Maximization.  One  weakness  is  the 
resolution  of  two  mixtures  with  a  small  separation.  The  algorithm  tends  interpret  both  mixtures  as  one 
mixture.  The  other  problem  is  the  convergence  to  a  local  maximum.  Sometimes,  the  local  maximum 
solution  does  not  represent  the  best  approximation.  The  third  problem  is  the  degrees  of  freedom. 
Adding  more  degrees  of  freedom,  i.e.,  adding  more  parameters  makes  the  problem  more  difficult  to 
solve  due  to  the  increase  of  complexity. 

As  a  final  conclusion,  equation  4.1  is  a  generalization  of  the  Expectation  Maximization  algorithm.  Under 
special  cases,  the  algorithm  is  reduced  to  K-Mean  and  Parzen  window. 
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